Abstract. The enumeration of minimal unsatisfiable subsets (MUSes) finds a growing number of practical applications, which include a wide range of diagnosis problems. As a concrete example, the problem of axiom pinpointing in the $\mathcal{EL}$ family of description logics (DLs) can be modeled as the enumeration of the group-MUSes of Horn formulae. In turn, axiom pinpointing for the $\mathcal{EL}$ family of DLs finds important applications, such as debugging medical ontologies, of which SNOMED CT is the best known example. The main contribution of this paper is to develop an efficient group-MUS enumerator for Horn formulae, HgMUS, that finds immediate application in axiom pinpointing for the $\mathcal{EL}$ family of DLs. In the process of developing HgMUS, the paper also identifies performance bottlenecks of existing solutions. The new algorithm is shown to outperform all alternative approaches when the problem domain targeted by group-MUS enumeration of Horn formulae is axiom pinpointing for the $\mathcal{EL}$ family of DLs, with a representative suite of examples taken from different medical ontologies.


1 Introduction 

Description Logics (DLs) are well-known knowledge representation formalisms [4]. DLs find a wide range of applications in computer science, including the semantic web and the representation of ontologies, but also in medical bioinformatics. Axiom pinpointing is the problem of computing one minimal axiom set (denoted MinA) that explains a subsumption relation in an ontology [49]. Example applications of axiom pinpointing include context-based reasoning, error-tolerant reasoning [33], and ontology debugging and revision [50,27]. Axiom pinpointing for different DLs has been studied extensively for more than a decade, with related work dating back to the mid-1990s [3,49,43,38,5,7,26,54,50,8,51,6,40,41,32,52,1,34].

The $\mathcal{EL}$ family of DLs is well-known for being tractable. Despite being inexpressive, the $\mathcal{EL}$ family of DLs, concretely the more expressive $\mathcal{EL}^+$, has been used for representing ontologies in the medical sciences, including the well-known SNOMED CT ontology [56]. Work on axiom pinpointing for the $\mathcal{EL}$ family of DLs can be traced to 2006, namely the CEL tool [5]. Later, in 2009, the use of SAT was proposed for axiom pinpointing in the $\mathcal{EL}$ family of DLs [51,57,52], concretely for the more expressive description logic $\mathcal{EL}^+$. This seminal work proposed a propositional Horn encoding that can be exponentially smaller than earlier work [5,7,8]. Moreover, the use of SAT for axiom pinpointing for the $\mathcal{EL}$ family of DLs, named EL+SAT [51,57,52], was shown to consistently outperform earlier work, concretely CEL [5]. Recent work [1] proposes the EL2MCS tool, which builds on these propositional encodings but exploits the relationship between axiom pinpointing and MUS enumeration; concretely, it relies on explicit hitting set dualization [31]. This tool is evaluated in [1], where it is shown to achieve conclusive performance gains over earlier work. The relationship between axiom pinpointing and MUS enumeration was also studied elsewhere [34]. Instead of exploiting hitting set dualization, this alternative approach exploits the enumeration of prime implicants [34].

The main contribution of this paper is to develop an efficient group-MUS enumerator for Horn formulae, referred to as HgMUS, that finds immediate application in axiom pinpointing for the $\mathcal{EL}$ family of DLs. In the process of developing HgMUS, the paper also identifies performance bottlenecks of existing solutions, in particular EL+SAT [51,52]. The new group-MUS enumerator for Horn formulae builds on the large body of recent work on problem solving with SAT oracles. This includes, among others, MUS extraction [12], MCS extraction and enumeration [35], and partial MUS enumeration [45,29,30]. HgMUS also exploits earlier work on solving Horn propositional formulae [18,39], and develops novel algorithms for MUS extraction in propositional Horn formulae. The experimental results, using well-known problem instances, demonstrate conclusive performance improvements over all other existing approaches, in most cases by several orders of magnitude.

The paper is organized as follows. Section 2 introduces the notation and definitions used throughout the paper. Section 3 reviews recent work on MUS enumeration, which serves as the basis for HgMUS. Afterwards, the new group-MUS enumerator HgMUS is described in Section 4. Section 5 compares HgMUS with existing alternatives. Experimental results on well-known problem instances from axiom pinpointing for the $\mathcal{EL}$ family of DLs are analyzed in Section 6. The paper concludes in Section 7.


2 Preliminaries 


We assume familiarity with propositional logic [13] and consider propositional Boolean formulae in Conjunctive Normal Form (CNF). A CNF formula $\mathcal{F}$ is defined over a set of Boolean variables $\mathcal{V}(\mathcal{F}) = \{x_1, \ldots, x_n\}$ as a conjunction of clauses $(c_1 \land \ldots \land c_m)$. A clause $c$ is a disjunction of literals $(l_1 \lor \ldots \lor l_k)$ and a literal $l$ is either a variable $x$ or its negation $\neg x$. We refer to the set of literals appearing in $\mathcal{F}$ as $\mathcal{L}(\mathcal{F})$. Formulae can be alternatively represented as sets of clauses, and clauses as sets of literals.

A truth assignment, or interpretation, is a mapping $\mu : X \rightarrow \{0,1\}$, with $X = \mathcal{V}(\mathcal{F})$ also used to denote the variables of $\mathcal{F}$. If all the variables in $X$ are assigned a truth value, $\mu$ is referred to as a complete assignment. Interpretations can also be seen as conjunctions or sets of literals. Truth valuations are lifted to clauses and formulae as follows: $\mu$ satisfies a clause $c$ if it contains at least one of its literals. Given a formula $\mathcal{F}$, $\mu$ satisfies $\mathcal{F}$ (written $\mu \models \mathcal{F}$) if it satisfies all its clauses, in which case $\mu$ is referred to as a model of $\mathcal{F}$.

Given two formulae $\mathcal{F}$ and $\mathcal{G}$, $\mathcal{F}$ entails $\mathcal{G}$ (written $\mathcal{F} \models \mathcal{G}$) iff all the models of $\mathcal{F}$ are also models of $\mathcal{G}$. $\mathcal{F}$ and $\mathcal{G}$ are equivalent (written $\mathcal{F} \equiv \mathcal{G}$) iff $\mathcal{F} \models \mathcal{G}$ and $\mathcal{G} \models \mathcal{F}$.

A formula $\mathcal{F}$ is satisfiable ($\mathcal{F} \not\models \bot$) if there exists a model for it. Otherwise it is unsatisfiable ($\mathcal{F} \models \bot$). SAT is the decision problem of determining the satisfiability of a propositional formula. This problem is in general NP-complete [16].

Some applications require computing certain types of models. In this paper, we will make use of maximal models, i.e. models in which a set-wise maximal subset of the variables is assigned value 1:

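Definition 1. (MxM). Let $\mathcal{F}$ be a satisfiable propositional formula, $\mu \models \mathcal{F}$ a model of $\mathcal{F}$ and $P \subseteq X$ the set of variables appearing in $\mu$ with positive polarity. $\mu$ is a maximal model (MxM) of $\mathcal{F}$ iff $\mathcal{F} \cup P \not\models \bot$ and for all $v \in X \setminus P$, $\mathcal{F} \cup P \cup \{v\} \models \bot$.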
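As an illustration of Definition 1, the following Python sketch enumerates, by brute force, the models of a tiny CNF formula and keeps those whose set of positive variables is set-wise maximal. Clauses are assumed to be lists of DIMACS-style integers ($v$ for $x_v$, $-v$ for $\neg x_v$); the function names are hypothetical, and the approach is exponential, so it is only meant for toy examples.

```python
from itertools import product


def models(clauses, n):
    """Yield every model over variables 1..n as the set P of variables assigned 1."""
    for bits in product((False, True), repeat=n):
        ones = {v for v in range(1, n + 1) if bits[v - 1]}
        # A literal l is satisfied iff its polarity matches membership of |l| in P.
        if all(any((l > 0) == (abs(l) in ones) for l in c) for c in clauses):
            yield ones


def maximal_models(clauses, n):
    """Return the MxMs: models P such that no other model strictly contains P."""
    ms = list(models(clauses, n))
    return [p for p in ms if not any(p < q for q in ms)]


# (not x1 or not x2): {x1} and {x2} are maximal, while {x1, x2} is not a model.
print(sorted(sorted(p) for p in maximal_models([[-1, -2]], 2)))  # [[1], [2]]
```

Requiring that no model strictly contains $P$ is equivalent to the per-variable condition of Definition 1, since such a model would also satisfy $\mathcal{F} \cup P \cup \{v\}$ for some $v \notin P$.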

Herein, we will denote a maximal model by P, i.e. the set of its positive literals. 
Horn formulae constitute an important subclass of propositional logic. These are 
composed of Horn clauses, which have at most one positive literal. Satisfiability of 
Horn formulae is decidable in polynomial time [18,24,39]. 

Given an unsatisfiable formula F, the following subsets represent different notions 
regarding (set-wise) minimal unsatisfiability and maximal satisfiability [31,35]: 

Definition 2. (MUS). $\mathcal{M} \subseteq \mathcal{F}$ is a Minimally Unsatisfiable Subset (MUS) of $\mathcal{F}$ iff $\mathcal{M}$ is unsatisfiable and $\forall c \in \mathcal{M}$, $\mathcal{M} \setminus \{c\}$ is satisfiable.

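Definition 3. (MCS). $\mathcal{C} \subseteq \mathcal{F}$ is a Minimal Correction Subset (MCS) iff $\mathcal{F} \setminus \mathcal{C}$ is satisfiable and $\forall c \in \mathcal{C}$, $\mathcal{F} \setminus (\mathcal{C} \setminus \{c\})$ is unsatisfiable.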

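Definition 4. (MSS). $\mathcal{S} \subseteq \mathcal{F}$ is a Maximal Satisfiable Subset (MSS) iff $\mathcal{S}$ is satisfiable and $\forall c \in \mathcal{F} \setminus \mathcal{S}$, $\mathcal{S} \cup \{c\}$ is unsatisfiable.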

An MSS is the complement of an MCS. MUSes and MCSes are closely related by the well-known hitting set duality [47,10,14,55]: every MCS (MUS) is an irreducible hitting set of all MUSes (MCSes) of $\mathcal{F}$. In the worst case, there can be an exponential number of MUSes and MCSes [31,42]. Moreover, MCSes are related to the MaxSAT problem, which consists of finding an assignment satisfying as many clauses as possible. The smallest MCS (largest MSS) represents an optimal solution to MaxSAT.
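The duality can be checked on a toy example. The Python sketch below computes all irreducible hitting sets of a family of sets by brute force, enumerating candidates by increasing size and discarding supersets of hitting sets already found. The example is hypothetical: a formula whose clauses, indexed 0 to 3, are $x_1$, $\neg x_1 \lor x_2$, $\neg x_2$ and $\neg x_1$, so that its MUSes are $\{0,3\}$ and $\{0,1,2\}$ and its MCSes are $\{0\}$, $\{1,3\}$ and $\{2,3\}$.

```python
from itertools import combinations


def minimal_hitting_sets(family):
    """All irreducible hitting sets of a family of sets (brute force, tiny inputs)."""
    universe = sorted(set().union(*family))
    found = []
    for k in range(len(universe) + 1):
        for cand in combinations(universe, k):
            s = set(cand)
            # s must intersect every set and must not contain a smaller hitting set.
            if all(s & t for t in family) and not any(h <= s for h in found):
                found.append(s)
    return found


def normalize(sets):
    """Canonical form for comparing families of sets."""
    return sorted(sorted(s) for s in sets)


muses = [{0, 3}, {0, 1, 2}]
mcses = [{0}, {1, 3}, {2, 3}]
print(normalize(minimal_hitting_sets(muses)) == normalize(mcses))  # True
print(normalize(minimal_hitting_sets(mcses)) == normalize(muses))  # True
```

Explicit dualization follows exactly this pattern at scale, which is why it becomes impractical when one of the two families is exponentially large.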

Motivated by several applications, MUSes and related concepts have been extended 
to CNF formulae where clauses are partitioned into disjoint sets called groups [31]. 

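Definition 5. (Group-Oriented MUS). Given an explicitly partitioned unsatisfiable CNF formula $\mathcal{F} = \mathcal{G}_0 \cup \ldots \cup \mathcal{G}_k$, a group-oriented MUS (or group-MUS) of $\mathcal{F}$ is a set of groups $\mathcal{G} \subseteq \{\mathcal{G}_1, \ldots, \mathcal{G}_k\}$, such that $\mathcal{G}_0 \cup \mathcal{G}$ is unsatisfiable, and for every $\mathcal{G}_i \in \mathcal{G}$, $\mathcal{G}_0 \cup (\mathcal{G} \setminus \{\mathcal{G}_i\})$ is satisfiable.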

Note the special role of $\mathcal{G}_0$ (group-0): this group consists of background clauses that are included in every group-MUS. Because of $\mathcal{G}_0$, a group-MUS, as opposed to an MUS, can be empty (when $\mathcal{G}_0$ is itself unsatisfiable). Nevertheless, in this paper we assume that $\mathcal{G}_0$ is satisfiable.
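The conditions of Definition 5 can be tested directly on small instances. In this hypothetical Python sketch, clauses are lists of DIMACS-style integers, `groups` is a list whose entries $0, \ldots, k-1$ stand for $\mathcal{G}_1, \ldots, \mathcal{G}_k$, and satisfiability is decided by brute force, so it is only suitable for toy examples.

```python
from itertools import product


def satisfiable(clauses):
    """Brute-force SAT check; clauses are lists of DIMACS-style integers."""
    vs = sorted({abs(l) for c in clauses for l in c})
    for bits in product((False, True), repeat=len(vs)):
        val = dict(zip(vs, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


def is_group_mus(g0, groups, chosen):
    """Definition 5: g0 plus the chosen groups is unsatisfiable, and dropping
    any single chosen group makes it satisfiable."""
    def with_groups(idx):
        return g0 + [c for i in idx for c in groups[i]]

    if satisfiable(with_groups(chosen)):
        return False
    return all(satisfiable(with_groups(chosen - {i})) for i in chosen)


g0 = [[-1, -2]]                       # background clause: not both x1 and x2
groups = [[[1]], [[2]], [[1], [2]]]   # three groups of unit clauses
print(is_group_mus(g0, groups, {0, 1}))     # True
print(is_group_mus(g0, groups, {2}))        # True
print(is_group_mus(g0, groups, {0, 1, 2}))  # False (not minimal)
```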

Analogously, the related concepts of group-MCS and group-MSS can be defined. We omit these definitions here due to lack of space. In the case of MaxSAT, the use of groups is investigated in detail in [23].
Algorithm 1: eMUS [45] / MARCO [30]
Input: F a CNF formula
1  I ← {p_i | c_i ∈ F}               // Variable p_i picks clause c_i
2  Q ← ∅
3  while true do
4      (st, P) ← MaximalModel(Q)
5      if not st then return
6      F' ← {c_i | p_i ∈ P}          // Pick selected clauses
7      if not SAT(F') then
8          M ← ComputeMUS(F')
9          ReportMUS(M)
10         b ← {¬p_i | c_i ∈ M}      // Negative clause blocking the MUS
11     else
12         b ← {p_i | p_i ∈ I \ P}   // Positive clause blocking the MCS
13     Q ← Q ∪ {b}
3 MUS Enumeration in Horn Formulae

Enumeration of MUSes has been the subject of research that can be traced to the seminal work of Reiter [47]. A well-known family of algorithms uses (explicit) minimal hitting set dualization [14,10,31]. The organization of these algorithms can be summarized as follows. First, compute all the MCSes of a CNF formula. Second, obtain the MUSes by computing the minimal hitting sets of the set of MCSes. The main drawback of explicit minimal hitting set dualization is that, if the number of MCSes is exponentially large, these approaches will be unable to compute MUSes, even if the total number of MUSes is small. As a result, recent work considered what can be described as implicit minimal hitting set dualization [29,45,30]. In these approaches, either an MUS or an MCS is computed at each step of the algorithm, with the guarantee that MUSes are reported from the outset, without first having to enumerate all MCSes. In some settings, implicit minimal hitting set dualization is the only solution for finding some MUSes of a CNF formula. As pointed out in this recent work, implicit minimal hitting set dualization aims to complement, but not replace, the explicit dualization alternative, and in some settings where enumeration of MCSes is feasible, explicit minimal hitting set dualization may be the preferred option [45,30].

Algorithm 1 shows the eMUS enumeration algorithm [45], also used in the most recent version of MARCO [30]. It relies on a two-solver approach aimed at enumerating the MUSes/MCSes of an unsatisfiable formula $\mathcal{F}$. On the one hand, a formula $\mathcal{Q}$ is used to enumerate subsets of $\mathcal{F}$. This formula is defined over a set of variables $I = \{p_i \mid c_i \in \mathcal{F}\}$, each one of them associated with one clause $c_i \in \mathcal{F}$. On the other hand, a SAT oracle decides the satisfiability of the selected subsets. Iteratively, until $\mathcal{Q}$ becomes unsatisfiable, eMUS computes a maximal model $P$ of $\mathcal{Q}$ and tests the satisfiability of the corresponding subformula $\mathcal{F}' \subseteq \mathcal{F}$. If it is satisfiable, $\mathcal{F}'$ represents an MSS of $\mathcal{F}$, and the clause $I \setminus P$ is added to $\mathcal{Q}$, preventing the algorithm from generating any subset of the MSS (i.e. any selection that leaves out a superset of the MCS) again. Otherwise, if $\mathcal{F}'$ is unsatisfiable, it is reduced to an MUS $\mathcal{M}$, which is blocked by adding to $\mathcal{Q}$ a clause made of the variables in $I$ associated with $\mathcal{M}$, with negative polarity. This way, no superset of $\mathcal{M}$ will be generated. Algorithm 1 is guaranteed to find all MUSes and MCSes of $\mathcal{F}$ in a number of iterations that corresponds to the sum of the number of MUSes and MCSes.
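To make the loop concrete, the following Python sketch mirrors Algorithm 1 on tiny inputs. All oracles are brute-force stand-ins chosen for brevity, not efficiency: SAT is decided by enumerating assignments, a maximum-cardinality model of $\mathcal{Q}$ (which is in particular maximal) stands in for MaximalModel, and ComputeMUS is a plain deletion-based reduction. Selector $p_i$ is encoded as the integer $i$ and picks clause `f[i - 1]`; all names are hypothetical.

```python
from itertools import combinations, product


def satisfiable(clauses):
    """Brute-force SAT check; clauses are lists of DIMACS-style integers."""
    vs = sorted({abs(l) for c in clauses for l in c})
    for bits in product((False, True), repeat=len(vs)):
        val = dict(zip(vs, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


def maximal_model(q, n):
    """A maximum-cardinality (hence maximal) model of q over selectors 1..n, or None."""
    for size in range(n, -1, -1):
        for chosen in combinations(range(1, n + 1), size):
            p = set(chosen)
            if all(any((l > 0) == (abs(l) in p) for l in c) for c in q):
                return p
    return None


def compute_mus(f, sel):
    """Deletion-based reduction of an unsatisfiable selection to an MUS."""
    mus = set(sel)
    for i in sorted(sel):
        if not satisfiable([f[j - 1] for j in mus - {i}]):
            mus.discard(i)                        # clause i is not needed
    return mus


def emus(f):
    """Implicit hitting set dualization as in Algorithm 1; returns (MUSes, MCSes)."""
    n, q, muses, mcses = len(f), [], [], []
    while True:
        p = maximal_model(q, n)
        if p is None:                             # Q unsatisfiable: all found
            return muses, mcses
        if not satisfiable([f[i - 1] for i in p]):
            m = compute_mus(f, p)
            muses.append(m)
            q.append([-i for i in sorted(m)])     # negative clause blocking the MUS
        else:
            mcs = set(range(1, n + 1)) - p
            mcses.append(mcs)
            q.append(sorted(mcs))                 # some MCS clause must be picked


muses, mcses = emus([[1], [-1, 2], [-2], [-1]])
```

On this example, $x_1 \land (\neg x_1 \lor x_2) \land \neg x_2 \land \neg x_1$, the sketch reports the MUSes {1, 4} and {1, 2, 3} and the MCSes {1}, {3, 4} and {2, 4}, in 5 iterations, i.e. the number of MUSes plus the number of MCSes.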

This paper considers the concrete problem of enumerating the group-MUSes of an unsatisfiable Horn formula. As highlighted earlier, and as discussed later in the paper, enumeration of the group-MUSes of unsatisfiable Horn formulae finds important applications in axiom pinpointing for the $\mathcal{EL}$ family of DLs, including $\mathcal{EL}^+$. It should be observed that the difference between the enumeration of plain MUSes of Horn formulae and the enumeration of group-MUSes is significant. First, enumeration of group-MUSes of Horn formulae cannot be achieved in total polynomial time, unless P = NP. This is an immediate consequence of the fact that axiom pinpointing for the $\mathcal{EL}$ family of DLs cannot be achieved in total polynomial time, unless P = NP [7], and that axiom pinpointing for the $\mathcal{EL}$ family of DLs can be reduced in polynomial time to group-MUS enumeration of Horn formulae [1]. Second, in contrast, enumeration of plain MUSes of Horn formulae can be achieved in total polynomial time (actually with polynomial delay) [44].

Given the above, a possible approach for enumerating group-MUSes of Horn formulae is to use an existing solution, based on either explicit or implicit minimal hitting set dualization. For example, the use of explicit minimal hitting set dualization was recently proposed in [1]. Alternatively, either eMUS [45] or the different versions of MARCO [29,30] could be used, as also pointed out in [34].

This paper instead opts to exploit the implicit minimal hitting set dualization approach [29,45,30], but develops a solution that is specific to the problem formulation. This solution is described in the next section.


4 Algorithm for Group-MUS Enumeration in Horn Formulae 

This section describes HgMUS, a novel and efficient group-MUS enumerator for Horn formulae based on implicit minimal hitting set dualization. In this section, $\mathcal{H}$ denotes the group of clauses $\mathcal{G}_0$, i.e. the background clauses. Moreover, $\mathcal{I}$ denotes the set of (individual) groups of clauses, with $\mathcal{I} = \{\mathcal{G}_1, \ldots, \mathcal{G}_k\}$. So, the unsatisfiable group-Horn formula corresponds to $\mathcal{F} = \mathcal{H} \cup \mathcal{I}$. Also, in this section, the formula $\mathcal{Q}$ shown in Algorithm 1 is defined on a set of variables associated with the groups in $\mathcal{I}$. For the problem instances considered later in the paper (obtained from axiom pinpointing for the $\mathcal{EL}$ family of DLs), each group of clauses contains a single unit clause. However, the algorithm would also work for arbitrary groups of clauses.

4.1 Organization 

The high-level organization of HgMUS mimics that of eMUS/MARCO (see Algorithm 1), with a few essential differences. First, the satisfiability testing step (because it operates on Horn formulae) uses the dedicated linear time algorithm LTUR [39]. LTUR can be viewed as one-sided unit propagation, since only variables assigned value 1 are propagated. Moreover, the simplicity of LTUR enables very efficient implementations that use adjacency lists for representing clauses instead of the now more commonly used watched literals. Second, the problem formulation motivates using a dedicated MUS extraction algorithm, which is shown to be more effective in this concrete case than other well-known approaches [12]. Third, we also highlight important aspects of the eMUS/MARCO implicit minimal hitting set dualization approach, which we claim have been overlooked in earlier work [57,52].

Algorithm 2: Computation of Maximal Models
Input: Q a CNF formula
Output: (st, P): with st a Boolean and P an MxM (if it exists)
1  (P, U, B) ← ({x ∈ V(Q) | ¬x ∉ L(Q)}, {x ∈ V(Q) | ¬x ∈ L(Q)}, ∅)
2  (st, P, U) ← InitialAssignment(Q ∪ P)
3  if not st then return (false, ∅)
4  while U ≠ ∅ do
5      l ← SelectLiteral(U)
6      (st, μ) ← SAT(Q ∪ P ∪ B ∪ {l})
7      if st then (P, U) ← UpdateSatClauses(μ, P, U)
8      else (U, B) ← (U \ {l}, B ∪ {¬l})
9  return (true, P)                   // P is an MxM of Q

4.2 Computing Maximal Models 

The use of maximal models for computing either MCSes of a formula or a set of clauses 
that contain an MUS was proposed in earlier work [45], which exploited SAT with 
preferences for computing maximal models [21,48]. The use of SAT with preferences 
for computing maximal models is also exploited in related work [51,52]. 

Computing maximal models of a formula $\mathcal{Q}$ can be reduced to the problem of extracting an MSS of a formula $\mathcal{Q}'$ [35], where the clauses of $\mathcal{Q}$ are hard and, for each variable $x_i \in \mathcal{V}(\mathcal{Q})$, $\mathcal{Q}'$ includes a unit soft clause $c_i = \{x_i\}$. Also, recent work [35,22,9,37] has shown that state-of-the-art MCS/MSS computation approaches outperform SAT with preferences. HgMUS uses a dedicated algorithm based on the LinearSearch MCS extraction algorithm [35], due to its good performance in MCS enumeration. Since all soft clauses are unit, it can also be related to the novel Literal-Based extractor algorithm [37]. The algorithm, shown in Algorithm 2, relies on making successive calls to a SAT solver. It maintains three sets of literals: $P$, an under-approximation of an MxM (i.e. positive literals s.t. $\mathcal{Q} \cup P \not\models \bot$); $B$, with negative literals $\neg l$ such that $\mathcal{Q} \cup P \cup \{l\} \models \bot$ (i.e. backbone literals); and $U$, with the remaining set of positive literals to be tested. Initially, $P$ and $U$ are initialized from a model $\mu$, with $P$ ($U$) including the literals appearing with positive (negative) polarity in $\mu$. Then, iteratively, it tries to extend $P$ with a new literal $l \in U$, by testing the satisfiability of $\mathcal{Q} \cup P \cup B \cup \{l\}$. If it is satisfiable, all the literals in $U$ satisfied by the model (including $l$) are moved to $P$. Otherwise, $l$ is removed from $U$ and $\neg l$ is added to $B$. This algorithm has a query complexity of $\mathcal{O}(|\mathcal{V}(\mathcal{Q})|)$.
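A minimal Python sketch of this linear search follows, omitting the pure-literal pre-initialization of line 1 of Algorithm 2. The SAT oracle `solve` is a hypothetical brute-force stand-in that returns the set of variables assigned 1; a real implementation would call an incremental CDCL solver instead. Keeping the backbone set $B$ across iterations is sound because $P$ only grows: once $\mathcal{Q} \cup P \cup \{l\}$ is unsatisfiable, it stays unsatisfiable for every larger $P$.

```python
from itertools import product


def solve(clauses, n):
    """Brute-force SAT oracle over variables 1..n; returns the set of true variables or None."""
    for bits in product((False, True), repeat=n):
        ones = {v for v in range(1, n + 1) if bits[v - 1]}
        if all(any((l > 0) == (abs(l) in ones) for l in c) for c in clauses):
            return ones
    return None


def maximal_model(q, n):
    """Linear-search MxM computation in the spirit of Algorithm 2 (no pure-literal step)."""
    mu = solve(q, n)                       # initial assignment
    if mu is None:
        return None
    p = set(mu)                            # positive part of the current model
    u = set(range(1, n + 1)) - p           # variables still to be tested
    b = set()                              # variables whose negation is a backbone literal
    while u:
        l = min(u)                         # SelectLiteral: any choice is correct
        units = [[v] for v in p] + [[-v] for v in b] + [[l]]
        mu = solve(q + units, n)
        if mu is not None:
            moved = u & mu                 # every untested variable true in mu, incl. l
            p |= moved
            u -= moved
        else:
            u.discard(l)
            b.add(l)
    return p


print(maximal_model([[-1, -2]], 2))        # {1}
```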

Algorithm 2 integrates a new technique, which consists in pre-initializing $P$ with the pure positive literals appearing in $\mathcal{Q}$ and $U$ with the remaining ones (line 1), and then requiring the literals of $P$ to be satisfied by the initial assignment (line 2). It can be easily proved that these pure literals are included in all MxMs of $\mathcal{Q}$ (setting a variable that never occurs negatively to 1 cannot falsify any clause), so a number of calls to the SAT solver can be avoided. Moreover, the SAT solver will never branch on these variables, easing the decision problems. This technique is expected to be effective in HgMUS. Note that, in this context, $\mathcal{Q}$ is made of two types of clauses: positive clauses blocking MCSes of the Horn formula, and negative clauses blocking MUSes. So, with this technique, the computation of MxMs is restricted to the variables representing groups appearing in some MUS of the Horn formula that has already been blocked in $\mathcal{Q}$.¹
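The pre-initialization of line 1 only requires a scan of the literals of $\mathcal{Q}$. A minimal Python sketch, assuming (hypothetically) that $\mathcal{Q}$ is given as lists of DIMACS-style integers over the selector variables $1, \ldots, n$:

```python
def initial_partition(q, n):
    """Line 1 of Algorithm 2: split selectors 1..n into P (never occurring
    negatively in q, hence in every MxM) and U (still to be tested)."""
    negative = {-l for c in q for l in c if l < 0}
    p = {v for v in range(1, n + 1) if v not in negative}
    u = {v for v in range(1, n + 1) if v in negative}
    return p, u


# Q blocks the MUS {p1, p2} (negative clause) and the MCS {p2, p3} (positive clause).
print(initial_partition([[-1, -2], [2, 3]], 4))   # ({3, 4}, {1, 2})
```

Since only the negative clauses blocking MUSes contain negative literals, $U$ is exactly the set of selectors of groups that belong to some MUS found so far.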

4.3 Adding Blocking Clauses 

One important aspect of HgMUS is the set of blocking clauses created and added to the formula $\mathcal{Q}$ (see Algorithm 1). These follow what was first proposed in eMUS [45] and MARCO [29,30]. For each MUS, the blocking clause consists of a set of negative literals, requiring at least one of the clauses in the MUS not to be included in future selected sets of clauses. For each MCS, the blocking clause consists of a set of positive literals, requiring at least one of the clauses in the MCS to be included in future selected sets of clauses. The way MCSes are handled is essential to prevent any selection that leaves out that MCS, or a superset of it, from being generated again. Although conceptually simple, it can be shown that existing approaches may not guarantee that supersets of MCSes (or, equivalently, subsets of the MSSes) are blocked. As argued later, this is the case with EL+SAT [57,52].
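The guarantees provided by the two kinds of blocking clauses can be checked exhaustively on a small example. In the Python sketch below, a selection is the set of selector variables assigned 1 (the picked groups), and the MSS, MCS and MUS over four groups are hypothetical values chosen for illustration.

```python
from itertools import combinations


def mus_blocking_clause(mus):
    """Negative clause: at least one group of the MUS must be left out."""
    return [-i for i in sorted(mus)]


def mcs_blocking_clause(mcs):
    """Positive clause: at least one group of the MCS must be picked."""
    return sorted(mcs)


def satisfies(selection, clause):
    """True iff the selection (set of picked selectors) satisfies the clause."""
    return any((l > 0) == (abs(l) in selection) for l in clause)


def subsets(s):
    """All subsets of s, as sets."""
    items = sorted(s)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]


mss, mcs, mus = {1, 3}, {2, 4}, {1, 2}
pos, neg = mcs_blocking_clause(mcs), mus_blocking_clause(mus)
# No subset of the MSS (i.e. no selection leaving out the whole MCS) can be picked again ...
print(all(not satisfies(s, pos) for s in subsets(mss)))                        # True
# ... and no superset of the MUS can be picked again.
print(all(not satisfies(s, neg) for s in subsets({1, 2, 3, 4}) if mus <= s))  # True
```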

4.4 Deciding Satisfiability of Horn Formulae 

It is well-known that the satisfiability of Horn formulae can be decided in linear time [18,24,39]. HgMUS implements the LTUR algorithm [39]. There are important reasons for this choice. First, LTUR is expected to be more efficient than plain unit propagation, since only variables assigned value 1 need to be propagated. Second, most implementations of unit propagation in CDCL SAT solvers (i.e. those that use watched literals) are not guaranteed to run in linear time [20]; this is for example the case with all implementations of Minisat [19] and its variants, for which unit propagation runs in worst-case quadratic time. As a result, using an off-the-shelf SAT solver and exploiting only unit propagation (as is done, for example, in earlier work [51,52,34]) is unlikely to be the most efficient solution.

Besides the advantages listed above, the use of a linear time algorithm for deciding the satisfiability of Horn formulae turns out to be instrumental for MUS extraction, as shown in the next section. In order to use LTUR for MUS extraction, an incremental version has been implemented, which allows for the incremental addition of clauses to the formula and the incremental identification of variables assigned value 1. Clearly, the amortized run time of LTUR, after adding $m = |\mathcal{F}|$ clauses, is $\mathcal{O}(\|\mathcal{F}\|)$, with $\|\mathcal{F}\|$ the number of literals appearing in $\mathcal{F}$.
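The incremental interface described above can be sketched in Python as follows. The class name and its `mark`/`undo` methods are hypothetical; undo is implemented with a trail, in the spirit of (but not necessarily identical to) the LTUR_undo operation used in Algorithm 3. Each clause keeps a counter of body variables not yet assigned 1, and until the next undo each variable is assigned 1 at most once and each body occurrence is decremented at most once, which gives the amortized linear bound.

```python
class IncrementalLTUR:
    """Incremental LTUR-style propagation for Horn clauses, with mark/undo.

    Clauses are lists of DIMACS-style integers with at most one positive literal.
    Only variables assigned 1 are propagated (one-sided unit propagation).
    After add() returns False, call undo() before adding further clauses.
    """

    def __init__(self):
        self.true = set()     # variables currently assigned 1
        self.clauses = []     # (body variables, head variable or None)
        self.count = []       # per clause: body variables not yet assigned 1
        self.watch = {}       # variable -> ids of clauses with that variable in the body
        self.trail = []       # undo log

    def add(self, clause):
        """Add one clause and propagate; return False iff a conflict arises."""
        heads = [l for l in clause if l > 0]
        assert len(heads) <= 1, "not a Horn clause"
        body = sorted({-l for l in clause if l < 0})
        cid = len(self.clauses)
        self.clauses.append((body, heads[0] if heads else None))
        self.count.append(sum(v not in self.true for v in body))
        for v in body:
            self.watch.setdefault(v, []).append(cid)
        self.trail.append(("clause", cid))
        queue = []
        if self.count[cid] == 0 and not self._fire(cid, queue):
            return False
        while queue:                       # each variable is dequeued at most once
            v = queue.pop()
            for c in self.watch.get(v, []):
                self.count[c] -= 1
                self.trail.append(("dec", c))
                if self.count[c] == 0 and not self._fire(c, queue):
                    return False
        return True

    def _fire(self, cid, queue):
        """All body variables of clause cid are 1: assign its head, or report a conflict."""
        head = self.clauses[cid][1]
        if head is None:
            return False
        if head not in self.true:
            self.true.add(head)
            self.trail.append(("var", head))
            queue.append(head)
        return True

    def mark(self):
        return len(self.trail)

    def undo(self, mark):
        """Retract every clause added, and every assignment made, since mark."""
        while len(self.trail) > mark:
            kind, x = self.trail.pop()
            if kind == "var":
                self.true.discard(x)
            elif kind == "dec":
                self.count[x] += 1
            else:                          # "clause": x is the most recently added clause
                body, _ = self.clauses.pop()
                self.count.pop()
                for v in body:
                    self.watch[v].pop()
```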

4.5 MUS Extraction in Horn Formulae 

For arbitrary CNF formulae, a number of approaches exist for MUS extraction, the most commonly used being the deletion-based approach [11,12]; other alternatives include the QuickXplain algorithm [25] and the more recent Progression algorithm [36]. It is also well-known and generally accepted that, due to its query complexity, the insertion-based algorithm [17] for MUS extraction is in practice not competitive with existing alternatives [12].

¹ SATPin [34] also exploits this insight of relevant variables, but not in the computation of MxMs, as SATPin does not compute MxMs.

Algorithm 3: Insertion-based [17] MUS extraction using LTUR [39]
Input: H, denotes the G0 clauses; I, denotes the set of (individual) group clauses
Output: M, denotes the computed MUS
1  (M, c_r) ← (H, 0)
2  LTUR_prop(M, M)                        // Start by propagating G0 clauses
3  while true do
4      if c_r > 0 then
5          M ← M ∪ {c_r}                  // Add transition clause c_r to M
6          if not LTUR_prop(M, {c_r}) then
7              LTUR_undo(M, M)
8              return M \ H               // Remove G0 clauses from computed MUS
9      S ← ∅
10     while true do
11         c_r ← SelectRemoveClause(I)    // Target transition clause
12         S ← S ∪ {c_r}
13         if not LTUR_prop(M ∪ S, {c_r}) then
14             I ← S \ {c_r}              // Update working set of groups
15             LTUR_undo(M, S)
16             break                      // c_r represents a transition clause

Somewhat surprisingly, this is not the case with Horn formulae when (an incremental implementation of) the LTUR algorithm is used. A modified insertion-based MUS extraction algorithm that exploits LTUR is shown in Algorithm 3. LTUR_prop propagates the consequences of adding some new set of clauses, given some existing incremental context. LTUR_undo unpropagates the consequences of adding some set of clauses (in order), given some existing incremental context. The organization of the algorithm mimics the standard insertion-based MUS extraction algorithm [17], but the use of the incremental LTUR yields a run time complexity that improves over other approaches. Consider the operation of the standard insertion-based algorithm [17], in which clauses are iteratively added to the working formula. When the formula becomes unsatisfiable, a transition clause [12] has been identified, which is then added to the MUS being constructed. The well-known query complexity of the insertion-based algorithm is $\mathcal{O}(m \times k)$, where $m$ is the number of clauses and $k$ is the size of a largest MUS. Now consider that the incremental LTUR algorithm is used. To find the first transition clause, the amortized run time is $\mathcal{O}(\|\mathcal{F}\|)$. Clearly, this holds true for any transition clause, and so the run time of MUS extraction with the LTUR algorithm becomes $\mathcal{O}(|\mathcal{M}| \times \|\mathcal{F}\|)$, where $\mathcal{M} \subseteq \mathcal{I}$ is a largest MUS. Algorithm 3 highlights the main differences with respect to a standard insertion-based MUS extraction algorithm. In contrast, observe that for a deletion-based algorithm the run time complexity will be $\mathcal{O}(|\mathcal{I}| \times \|\mathcal{F}\|)$. In situations where the sizes of MUSes are much smaller than the number of groups in $\mathcal{I}$, this difference can be significant. As a result, when extracting MUSes from Horn formulae, and when using a polynomial time incremental decision procedure, an insertion-based algorithm should be used instead of other more commonly used alternatives.
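A compact Python sketch of the insertion-based scheme follows, assuming Horn clauses given as lists of DIMACS-style integers and groups given as lists of clauses. To keep the block self-contained, it re-runs a non-incremental LTUR-style check from scratch for every query instead of using incremental LTUR_prop/LTUR_undo operations, so it reproduces the control flow of Algorithm 3 but not the $\mathcal{O}(|\mathcal{M}| \times \|\mathcal{F}\|)$ bound; the function names are hypothetical.

```python
def horn_sat(clauses):
    """LTUR-style satisfiability check for Horn clauses (non-incremental)."""
    heads, count, watch = [], [], {}
    for i, clause in enumerate(clauses):
        pos = [l for l in clause if l > 0]
        assert len(pos) <= 1, "not a Horn clause"
        heads.append(pos[0] if pos else None)
        body = {-l for l in clause if l < 0}
        count.append(len(body))
        for v in body:
            watch.setdefault(v, []).append(i)
    true = set()
    ready = [i for i, k in enumerate(count) if k == 0]   # clauses whose body is all 1
    while ready:
        i = ready.pop()
        if heads[i] is None:
            return False                 # a purely negative clause is falsified
        if heads[i] in true:
            continue
        true.add(heads[i])               # only variables assigned 1 are propagated
        for j in watch.get(heads[i], []):
            count[j] -= 1
            if count[j] == 0:
                ready.append(j)
    return True


def insertion_mus(h, groups):
    """Insertion-based group-MUS extraction following Algorithm 3.

    h are the G0 clauses and groups is a list of clause lists; h plus all
    groups must be unsatisfiable. Returns the indices of the MUS groups."""
    mus, base = [], list(h)              # base = H plus the transition groups found
    candidates = list(range(len(groups)))
    while horn_sat(base):                # stop once H plus the MUS groups is unsatisfiable
        added = []
        for g in candidates:             # insert groups until a transition group appears
            added.append(g)
            if not horn_sat(base + [c for i in added for c in groups[i]]):
                break
        else:
            raise ValueError("h plus all groups is satisfiable")
        transition = added.pop()
        mus.append(transition)
        base += groups[transition]
        candidates = added               # only groups inserted before the transition remain
    return sorted(mus)


h = [[-1, -2, 3], [-3]]                  # x1 and x2 imply x3; not x3
groups = [[[1]], [[4]], [[-4, 2]], [[2]]]
print(insertion_mus(h, groups))          # [0, 1, 2]
```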


5 Comparison with Existing Alternatives 

This section compares HgMUS with the group-MUS enumerators used in EL+SAT [51,52] and SATPin [34]. The experimental comparison with these enumerators, as well as with other approaches for axiom pinpointing for the $\mathcal{EL}$ family of DLs, is provided in Section 6.

5.1 EL+SAT 

The best known SAT-based approach for axiom pinpointing is EL+SAT [51,57,52]. EL+SAT is composed of two main phases. The first phase compiles the axiom pinpointing problem to a Horn formula. The second phase enumerates the so-called MinAs, and corresponds to group-MUS enumeration for this Horn formula [1]. Although existing references emphasize the enumeration of MinAs (MUSes) using an AllSAT approach (itself inspired by an AllSMT approach [28]), the connection with MUS enumeration is immediate [1]. More importantly, EL+SAT shares a number of similarities with implicit minimal hitting set dualization, but also crucial differences, which we now analyze.

Similar to eMUS, EL+SAT selects subformulae of an unsatisfiable Horn formula. This is achieved with a SAT solver that always assigns variables value 1 when branching [52]. This corresponds to solving SAT with preferences [21,48], and so it corresponds to computing a maximal model, in much the same way as eMUS operates.

In EL+SAT, the approach for deciding the satisfiability of Horn subformulae is based on running the unit propagation engine of a CDCL SAT solver. As explained earlier, this can be inefficient when compared with the dedicated LTUR algorithm for Horn formulae [39]. Moreover, in EL+SAT, MUSes are extracted with what can be viewed as a deletion-based algorithm [11,12]. Although more efficient alternatives are suggested, none is asymptotically as efficient as the dedicated algorithm proposed in Section 4.5.

Finally, the most important drawback is the way in which sets of clauses that do not contain an MUS/MinA are blocked. In our setting of implicit minimal hitting set dualization, such a set corresponds to one MCS. The approach used in EL+SAT consists of creating a blocking clause based solely on the decision variables (which are always assigned value 1) [57,52]². Thus, the learned clauses, although blocking one MCS (and the corresponding MSS), do not block supersets of MCSes (and the corresponding subsets of the MSSes). This can result in exponentially more iterations than necessary, and partly explains the poor performance of EL+SAT in practice. It should be further observed that this drawback becomes easier to spot once the problem is described as MUS enumeration by implicit minimal hitting set dualization.

² The clause learning mechanism used in EL+SAT is detailed in [52, page 17, first paragraph].
5.2 SATPin


SATPin [34] represents a recent SAT-based alternative for axiom pinpointing for the $\mathcal{EL}$ family of DLs, which focuses on optimizing the low-level implementation details of the CDCL SAT solver, including the use of incremental SAT solving. As indicated above, HgMUS instead opts to revisit the LTUR [39] algorithm from the late 1980s, since it is guaranteed to run in linear time for Horn formulae, and can be implemented with small overhead. The SATPin approach is presented in terms of iteratively computing prime implicants. However, computing a prime implicant is tightly related to extracting an MUS [15]. As a result, some aspects of the organization of SATPin can be related to those of EL+SAT, namely the procedure for extracting MUSes/MinAs. Although the actual enumeration of candidate sets is not detailed in [34], the description of SATPin suggests the use of model enumeration with some essential pruning techniques.

6 Experimental Results 

This section evaluates group-MUS enumerators for Horn formulae obtained from axiom pinpointing problems for the $\mathcal{EL}$ family of DLs, particularly applied to medical ontologies. A set of standard benchmarks is considered. These have been used in earlier work, e.g. [5,51,32,1,34].

Since all experiments consist of converting axiom pinpointing problems into group 
MUS enumeration problems, the tool that uses HgMUS as its back-end is named 
EL2MUS. Thus, in this section, the results for EL2MUS illustrate the performance of 
the group-MUS enumerator described in this paper. 

6.1 Experimental Setup 

Each considered instance represents the problem of explaining a particular subsumption relation (query) entailed in a medical ontology. Four medical ontologies³ are considered: GALEN [46], GENE [2], NCI [53] and SNOMED CT [56]. For GALEN, we consider two variants: FULL-GALEN and NOT-GALEN. The most important ontology is SNOMED CT and, due to its huge size, it also produces the hardest axiom pinpointing instances. For each ontology (including the GALEN variants), 100 queries are considered: 50 random (which are expected to be easier) and 50 sorted (expected to have a large number of minimal explanations) queries. So, there are 500 queries in total. Given an ontology and a subsumption query, the encoding proposed in [51,52] produces a Horn formula and a set of axioms (variables) which may be responsible for the subsumption relation. This can be transformed into a group-MUS enumeration problem where the original Horn formula forms group-0 and each axiom constitutes a group containing only a unit clause.

Two different experiments were considered by applying two different simplification techniques to the problem instances, both of which were proposed in [52]. The first one uses the Cone-Of-Influence (COI) reduction. This yields instances that are reduced in both the size of the Horn formula and the number of axioms, but are still quite large. Similar techniques are exploited in related work [5,32,34]. The second one considers the more effective reduction technique (which we refer to as x2), consisting in applying the COI technique, re-encoding the Horn formula into a reduced ontology, and encoding this ontology again into a Horn formula. This results in small Horn formulae, which will be useful to evaluate the algorithms when there are a large number of MUSes/MCSes.

³ GENE, GALEN and NCI ontologies are freely available at http://lat.inf.tu-dresden.de/~meng/toyont.html. The SNOMED CT ontology was requested from IHTSDO under a nondisclosure license agreement.

Fig. 1: Cactus plot comparing EL+SAT, SATPin and EL2MUS on the COI instances

The experiments compare EL2MUS to different algorithms, namely EL+SAT [51,52], CEL [5], JUST [32] and SATPin [34]. EL+SAT [52] has been shown to outperform CEL [5], whereas SATPin [34] has been shown to outperform the MUS enumerator MARCO [30].

The comparison with CEL and JUST imposes a number of constraints. First, CEL only computes 10 MinAs, so all comparisons with CEL only consider reporting the first 10 MinAs/MUSes. Also, CEL uses a simplification technique similar to COI, so CEL is considered in the first experiments. Second, JUST operates on selected subsets of $\mathcal{EL}^+$, i.e. the description logic used in most medical ontologies. As a result, all comparisons with JUST consider solely the problem instances for which JUST can compute correct results. JUST accepts the simplified x2 ontologies, so it is considered in the second experiments. The comparison with these tools is presented at the end of the section.

EL2MUS interfaces the SAT solver Minisat 2.2 [19] for computing maximal models. All the experiments were performed on a Linux cluster (2 GHz) and the algorithms were given a time limit of 3600 s and a memory limit of 4 GB.

6.2 COI Instances 

Figure 1 summarizes the results for EL+SAT, SATPin and EL2MUS. EL+SAT does not show in the plot due to its poor performance. As can be observed, EL2MUS terminates on more instances than any of the other tools. Figure 2 shows scatter plots comparing the different tools. As can be concluded, and with a few outliers, the performance of EL2MUS exceeds that of each of the other tools by at least one order of magnitude (and often by more). Figure 2d summarizes the results of the scatter plots, where the percentages shown are computed for problem instances on which at least one of the tools takes more than 0.001 s. CEL is not shown in the table due to the special constraints mentioned above. As can be observed, EL2MUS outperforms each of the other tools on all of the problem instances, in many cases by two or more orders of magnitude.

Fig. 2: Scatter plots for COI instances: (c) comparison with CEL; (d) summary table
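The summary percentages can be recomputed from per-instance run times once the filtering rule is fixed. The sketch below assumes that a percentage is the share of counted instances on which the first tool is strictly faster and that timed-out runs are recorded as the 3600 s limit; it also reports the share of instances with a speedup of at least one order of magnitude. The function name `pairwise_summary` and the run times are hypothetical, not data from the paper.

```python
from typing import Dict, Tuple


def pairwise_summary(times_a: Dict[str, float], times_b: Dict[str, float],
                     min_time: float = 0.001) -> Tuple[float, float]:
    """Compare tool A against tool B on their common instances.

    Only instances on which at least one tool takes more than `min_time`
    seconds are counted. Timed-out runs are assumed to be recorded as the
    time limit. Returns (% of counted instances where A is strictly faster,
    % where A is at least 10x faster)."""
    counted = [i for i in times_a
               if i in times_b and max(times_a[i], times_b[i]) > min_time]
    if not counted:
        return 0.0, 0.0
    faster = sum(times_a[i] < times_b[i] for i in counted)
    tenfold = sum(10 * times_a[i] <= times_b[i] for i in counted)
    return 100.0 * faster / len(counted), 100.0 * tenfold / len(counted)


# Hypothetical run times in seconds (not data from the paper).
a = {"i1": 0.5, "i2": 0.0005, "i3": 2.0, "i4": 3600.0}
b = {"i1": 12.0, "i2": 0.0008, "i3": 1.5, "i4": 3600.0}
print(pairwise_summary(a, b))
```

With these made-up numbers, instance i2 is excluded because both tools finish within 0.001 s, i4 is a tie at the time limit, and only i1 counts as a win (and as a tenfold win) for A, so both percentages equal one third of the counted instances.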

6.3 x2 Instances 

The x2 instances are significantly simpler than the COI instances. Thus, whereas the COI instances serve to assess the scalability of each approach, the x2 instances highlight the expected performance in representative settings. Figure 3a summarizes the performance of EL+SAT, SATPin and EL2MUS. As with the COI instances, EL+SAT does not show in the plot due to its poor performance. Moreover, as before, EL2MUS exhibits a clear performance edge in terms of terminated instances.

Fig. 3: Cactus plots comparing EL+SAT, SATPin and EL2MUS on the x2 instances
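A cactus plot such as those in Figures 1 and 3a is built for each tool by sorting the run times of the instances on which it terminates. The sketch below assumes that non-terminating runs are stored as `math.inf` (or any value not below the limit); `cactus_points` is a hypothetical helper, not part of any of the tools compared.

```python
import math
from typing import Dict, List, Tuple

TIME_LIMIT = 3600.0  # seconds, the time limit used in the experiments


def cactus_points(runtimes: Dict[str, float],
                  limit: float = TIME_LIMIT) -> List[Tuple[int, float]]:
    """Points of a cactus plot for one tool: the k-th point (k, t) says that
    the k fastest solved instances were each solved within t seconds.
    Runs that did not terminate are assumed to be recorded as >= limit."""
    solved = sorted(t for t in runtimes.values() if t < limit)
    return list(enumerate(solved, start=1))


# Hypothetical run times in seconds.
runs = {"i1": 2.0, "i2": 0.1, "i3": math.inf, "i4": 50.0}
print(cactus_points(runs))  # [(1, 0.1), (2, 2.0), (3, 50.0)]
```

Plotting these points with the number of solved instances on one axis and the run time on the other yields one curve per tool; a curve that extends further along the instance axis belongs to a tool that terminates on more instances.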

A pairwise comparison between the different tools is summarized in Figure 4. Although not as impressive as for the COI instances, EL2MUS still consistently outperforms all other tools. Figure 4d summarizes the results where, as before, the percentages shown are computed for problem instances for which at least one of the tools takes more than 0.001 s. Observe that, for these easier instances, SATPin becomes competitive with EL2MUS. Nevertheless, for instances taking more than 0.1 s, EL2MUS outperforms SATPin on 100% of the instances. Thus, the 67.69% shown in the table is due to instances for which both SATPin and EL2MUS take at most 0.04 s. The summary table also lists the number of computed MUSes for the 19 instances on which EL2MUS does not terminate (none of the other tools terminates on these 19 instances either). EL2MUS computes 9948 MUSes in total on these instances. As can be observed from the table, the other tools lag behind and compute significantly fewer MUSes: EL2MUS computes more than 10 times as many MUSes as EL+SAT and more than 5 times as many as SATPin.

EL2MUS not only terminates on more instances than any other approach and computes more MUSes on the unsolved instances; it also reports the sequence of MUSes much faster. Figure 3b shows, for each MUS computed over the whole set of instances, the time at which it was reported. This figure compares EL+SAT, SATPin and EL2MUS, as these are the only methods able to report MUSes incrementally from the start of the run. The results confirm that EL2MUS finds many more MUSes in less time than the alternatives.
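Figure 3b aggregates MUS report times over all instances. One plausible way to build such a curve, assuming each run logs the time (measured from its own start) at which every MUS is reported, is to merge and sort those timestamps; the helper names `mus_report_curve` and `muses_reported_by` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def mus_report_curve(report_times: Dict[str, List[float]]) -> List[Tuple[float, int]]:
    """Merge the times (seconds from the start of each run) at which MUSes
    were reported, over all instances, into (t, n) pairs: n MUSes in total
    had been reported by time t."""
    merged = sorted(t for times in report_times.values() for t in times)
    return [(t, n) for n, t in enumerate(merged, start=1)]


def muses_reported_by(curve: List[Tuple[float, int]], deadline: float) -> int:
    """Total number of MUSes reported within `deadline` seconds."""
    times = [t for t, _ in curve]  # already sorted by construction
    return bisect_right(times, deadline)


# Hypothetical per-instance report times in seconds.
reports = {"i1": [0.2, 1.0], "i2": [0.5]}
curve = mus_report_curve(reports)
print(curve)                          # [(0.2, 1), (0.5, 2), (1.0, 3)]
print(muses_reported_by(curve, 0.6))  # 2
```

Drawing one such curve per tool makes the comparison in Figure 3b direct: at any time t, the higher curve belongs to the tool that has reported more MUSes.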

These experimental results suggest not only that EL2MUS is the best-performing axiom pinpointing tool on both the COI and x2 problem instances, but also, given the results on the COI instances, that it is expected to scale best on more challenging problem instances.
Fig. 4: Scatter plots for x2 instances: (b) comparison with SATPin; (d) summary table


6.4 Assessment of Non SAT-Based Axiom Pinpointing Tools 

Figure 2c and Figure 4c show scatter plots comparing EL2MUS with CEL [5] and JUST [32], respectively for the COI and x2 instances.* As indicated earlier, CEL only computes 10 MinAs, and so the run times shown are for computing the first 10 MinAs. As can be observed, the performance edge of EL2MUS is clear, with the performance gap exceeding one order of magnitude almost without exception. Moreover, JUST [32] is a recent state-of-the-art axiom pinpointing tool for the less expressive ELH DL. Thus, not all subsumption relations can be represented and analyzed. The results shown are for the subsumption relations for which JUST gives the correct results. In total, 382 instances could be considered, and these are shown in the plot. As before, the performance edge of EL2MUS is clear, with the performance gap exceeding one order of magnitude without exception. In this case, since the x2 instances are in general much simpler, the performance gap is even more significant.

* The other scatter plots are not shown, but the conclusions are the same.

7 Conclusions & Future Work 

Enumeration of group MUSes for Horn formulae finds important applications, including axiom pinpointing for the EL family of DLs. Since the EL family of DLs, namely EL+, is widely used for representing medical ontologies, enumeration of group MUSes for Horn formulae represents a promising and strategic application of SAT technology, including SAT solvers, MCS extractors and enumerators, and MUS extractors and enumerators. This paper develops a highly optimized group MUS enumerator for Horn formulae, which is shown to substantially outperform every other existing approach. Performance gains are almost without exception at least one order of magnitude, and most often significantly more than that. More importantly, the experimental results demonstrate that SAT-based approaches are by far the most effective approaches for axiom pinpointing for the EL family of DLs. The performance gains over non-SAT-based approaches are equally conclusive.
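To make concrete what a group-MUS enumerator for Horn formulae computes, the following deliberately naive Python sketch enumerates all minimal sets of groups (axioms) that, together with always-present background clauses, entail a goal atom. It checks Horn entailment by forward chaining and tries subsets in order of increasing size; this brute-force search is exponential and is not the SAT-based algorithm of EL2MUS. The toy encoding, in which atom X reads as "the queried concept A is subsumed by X", and all names (`derives`, `group_muses`, `ax1` to `ax4`) are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Clause = Tuple[FrozenSet[str], str]  # (body atoms, head atom), read as body -> head


def derives(clauses: List[Clause], goal: str) -> bool:
    """Forward chaining to a fixpoint: is `goal` entailed by the definite Horn
    clauses? Equivalently, clauses + (goal -> false) is unsatisfiable."""
    known: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return goal in known


def group_muses(background: List[Clause], groups: Dict[str, List[Clause]],
                goal: str) -> List[FrozenSet[str]]:
    """Brute-force enumeration of all minimal sets of groups that, together
    with the background clauses, entail `goal`. Exponential: illustration only."""
    found: List[FrozenSet[str]] = []
    names = sorted(groups)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            s = frozenset(subset)
            if any(m <= s for m in found):
                continue  # contains an already-found MUS, so it is not minimal
            clauses = background + [c for g in s for c in groups[g]]
            if derives(clauses, goal):
                found.append(s)
    return found


# Toy encoding (an assumption, not the EL+ encoding used by EL2MUS): to test
# whether A is subsumed by C, assume A and ask whether C can be derived.
background = [(frozenset(), "A")]
axioms = {
    "ax1": [(frozenset({"A"}), "B")],  # A ⊑ B
    "ax2": [(frozenset({"B"}), "C")],  # B ⊑ C
    "ax3": [(frozenset({"A"}), "C")],  # A ⊑ C
    "ax4": [(frozenset({"D"}), "C")],  # D ⊑ C
}
print(group_muses(background, axioms, "C"))  # the MinAs {ax3} and {ax1, ax2}
```

Because Horn entailment is monotone, any proper entailing subset of a candidate would contain a smaller MUS found in an earlier round, so skipping supersets of already-found MUSes is enough to guarantee minimality. Here the two MinAs are {ax3} and {ax1, ax2}, while ax4 appears in none of them.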

Future work will exploit the integration of additional recent work on SAT-based problem solving, e.g. in MCS enumeration and MUS enumeration, to further improve the performance of axiom pinpointing.